The MultiBoson method 



X 



Philippe de Forcrand 1 

ETH, CH-8092 Zurich, Switzerland 



Abstract 



5— i 

This review describes the multiboson algorithm for Monte Carlo simulations of 
lattice QCD, including its static and dynamical aspects, and presents a comparison 
with Hybrid Monte Carlo. 
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Q\ . 1 The MultiBoson approach in a nutshell 

On 

Monte Carlo simulations of lattice QCD aim at sampling the partition function 

<D . N f 



JV f 

Z = JI[dU x ,,e- s ^ u » l[det(l/)({U}) + 



where U X}IA are gauge links, S g ({U}) is the local gauge action (e.g. plaquette), 
Tp ({[/}) is the local discretized Dirac operator, and the fermion (quark) of 
flavor i has mass m;, i = 1, ..,Nf. The determinant results from the integra- 
tion over the anti-commuting fermion fields. It couples all gauge links U to 
each other. Such a non-local interaction implies that, in order to perform the 
Monte Carlo update of a single gauge link U x ^ — > U' x , one must compute its 
interaction with all other links, which represents an amount of work at least 
0(V), where V is the lattice volume. Updating all links costs 0(V 2 ) or more, 
in contrast to the case of a local interaction like S g , where the cost is simply 
0{V). Therein lies the difficulty of simulating dynamical fermions. 

Progress in this old problem came with the introduction of an auxiliary boson 
field tp and the use of the Gaussian integral representation, appropriate for the 
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case of two flavors degenerate in mass: 

|det(# + m)| 2 = J [#*][#] e -l^+ m )-Vl 2 _ ( 2 ) 

This identity is at the core of the Hybrid Monte Carlo (HMC) method [1]. 
One alternates updates of {U} and {</)}. However, the effective action \(ID + 
m) _1 0| 2 is still non-local because (ID + m) _1 is. The idea is that the work 
0(V) previously necessary to update one link U Xtfl by an arbitrary step can 
now be expended to update all links U by an infinitesimal (in fact, very small) 
step (see [1] for details). 

To get rid of non- locality altogether, Luscher proposed [2] to turn Eq.(2) 
"upside-down" : 

|det(# +m- z k )\- 2 = J [#*][#] e -\W +m ~ z ^ 2 , (3) 

which is true for any complex number z k outside the spectrum of (Jp + to). 
The action in the right-hand side is now local. Its range is twice that of Jp : <p x 
interacts at most with its next-nearest neighbors for the standard discretiza- 
tion of Jp (Wilson or staggered). On the left-hand side, one can approximate 
the desired |det(^> +to)| 2 by considering the product of factors Eq.(3) with 
different z k s, each with its own auxiliary boson field (f) k . If one chooses a 
polynomial P n (x) = c n Y[ k =i( x — z k ) which approximates 1/x over the whole 
spectrum of (JD + to), then P n (JD + to) [Jp + to) -1 , and 

|det(^ +m)| 2 w \det- 1 P n (ip +m)| 2 = c^ni det_1 (^ + m - z k )\ 2 

k 




This equation summarizes the MultiBoson (MB) method, with effective action 

n 

s = Y.\i I P+ m - z k)M 2 ■ (5) 
k=i 

Each term in this action is most sensitive to fluctuations in the Dirac spectrum 
nearest z k , so that one may view each boson field (fi k as "dedicated" to the 
control of a particular region (IR, UV, ...) of the spectrum. Note that the 
discrete sum over k can be seen as an approximation to an integral in a fifth 
dimension. It is interesting that one recovers a very similar effective action 
when bosonizing continuum QCD, and then discretizing [3]. 
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The MB effective action is local, but it involves n auxiliary boson fields <pk- 
It therefore now takes work 0(n) to update one link U by an arbitrary step. 
Larger n means more work, but also a more accurate approximation of det(Ip + 
m)\ 2 . One clearly wants to choose the polynomial P n so as to minimize n 
while preserving sufficient accuracy. This is the static setup of the MultiBoson 
method, reviewed in Sec. 2. For the actual MB simulation, one must also 
arrange the update of all the gauge and boson fields, so as to explore phase 
space with minimum computer effort. These MB dynamics issues are reviewed 
in Sec. 3. A somewhat subjective assessment of the MultiBoson method and 
a comparison with HMC follows in Sec. 4. 

In the rest of the paper, the Wilson discretization of the Dirac operator is nor- 
mally used, so that [Jp + to) is replaced by (1 — kM), where k is the hopping 
parameter and M the hopping matrix. But the staggered (Kogut-Susskind) 
Dirac discretization could be used just as well. In fact, the polynomial approx- 
imation which forms the basic idea of the MB method is quite general. One 
may construct the polynomial P n (x) to approximate an arbitrary function of 
x, like l/\/x or x~ l l 4 . This has led to simulations of QCD with 1 flavor [4] or 
oi N — 1 supersymmetric theory [5], respectively. The MB method has even 
been used in a particularly difficult context, to simulate the (2 + l)d Hubbard 
model at half-filling [6]. All these projects are summarized chronologically in 
Table I, which tries to sort out the many avatars of the MB method reviewed 
next. 



2 Statics 



2.1 How to choose the approximating polynomial? 



No less than four proposals have been made for selecting the polynomial 
P n (JP + to). The reason is that what constitutes an "optimal" polynomial 
has been a rather subjective issue until the most recent proposal. The succes- 
sive proposals have tried to reduce the degree n of the polynomial for a given 
"accuracy." The general idea is that n can be reduced if one knows more 
about the spectral properties of the Dirac matrix. The trade-off is that more 
adaptive methods do not have the aesthetic appeal and the clarity of analytic 
results. We review these proposals in chronological order, which is also that 
of increasing accuracy. 
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Table 1 

Chronological summary of published MultiBoson simulations 



Ref. 


Theory 


MultiBoson variant: 


Fermion mass: 


Measured 


and 








light (LL) to 


improvement 


Year 




exact? 


non-Herm.? 


A/3/0? 


heavy (HH) 


over HMC 


[7] 1994 


SU(2),N f = 2 








LH 


— 


[8] 1994 


SU{2),N f = 2 








LH 


— 


[9] 1994 


QED 2 ,N f = 2 


X* 






LH 


— 


[10] 1995 


QCD,N f = 2 


X* 






H 


— 


[6] 1995-97 


(2 + l)d Hubbard 








"LL"t 


— 


[11] 1995 


QCD, N f = 2 


X 


X 




H 


0(1) 


[5] 1995-99 


N = 1 SUSY 


X 






LH 


— 


[12] 1995 


QCD, Nf = 2 








LH 


— 


[13] 1995 


QCD, Nf = 2 








H 


<1* 


[14] 1996 


QCD, Nf = 2 


X 


X 




H 


— 


[15] 1996 


QED 2 ,N f = 2 








LH 


— 


[16] 1996 


QCD, Nf = 2 


X 


X 




L 


0(1) 


[4] 1996-98 


QCD, TV/ = 1 


X 


X 




HH 


0(10) 


[17] 1996 


SU{2),N f = 2 








LH 


0(1)* 


[18] 1997 


QCD, iV> = 2 


X 


X 




LH 


<1 


[19] 1997 


QED 2 ,N f = 2 


X 






LH 


2-3 


[20] 1998 


QCD, Nf = 2 


X 


X 


X 


LH 


0(5) 



* The correction factor was computed exactly by Lanczos. 

t The condition number of the fermion matrix is very large ~ e@. 

t Comparison with the Kramers algorithm. 

2.1.1 Hermitian Chebyshev polynomial 

The original proposal of Liischer [2] is to work with the hermitian matrix 

(1 - kM) 



Q = 7s- 



(1 + 8/0 



(6) 



where the normalization ensures that the spectrum {A} is in [— 1,+1]. For 
Nf = 2 flavors, the approximating polynomial P n (Q 2 ) ~ ^ can be chosen as 
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that which minimizes the maximum relative error 



i?(A 2 ) 



Pn{\ 2 ) - 1/A 2 

1/A 2 



A 2 P n (A 2 ) - 1 



(7) 



in absolute value over the interval A 2 



G [e, 1]. This polynomial is 



Pn(x) 




i ( 2x _ 1+e 
n+Hi-e i- £ 



) 



(8) 



where T n (x) is the nth-order Chebyshev polynomial [T n (cos(6)) = cos(n$)}. 
The great advantage of this choice is that an analytic upper bound on the 
error R{\ 2 ) is known: 



Therefore, it is sufficient to monitor the smallest eigenvalue of Q 2 . One can 
then choose e and n so as to guarantee any prescribed accuracy. 

The main drawbacks of this choice are: (i) the upper bound of +1 for the 
spectrum of Q 2 is very conservative (it is reached in the free field only), so 
that the approximation extends uselessly in a region where it is not needed 
[this can easily be fixed by also monitoring the largest eigenvalue of Q 2 and 
changing the normalization in Eq.(6) accordingly]; (ii) it deals with 75^, 
which is more difficult to invert than Tp itself, as has been shown many times 
in the calculation of the quark propagator (BiCGStab is more efficient than 



2.1.2 Non-hermitian Chebyshev polynomial 

A rigorous bound of the type Eq.(9) can also be obtained over an elliptic 
domain in the complex plane. In particular, one can try to enclose the spectrum 
{A} of the Wilson-Dirac operator (1 — kM) inside an ellipse centered at (1, 0). 
If the major and minor semi-axes are a < 1 and b, then the relative error 
|AP„(A) — 1 1 is bounded by 



A simple ansatz is to take a = b. Then the zeroes of P n are equally spaced 
around the circle of radius 1 centered at (1,0). An example is shown in Fig. la, 




(9) 



CG; see, e.g., [21]). 




(10) 
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where the boundary of the Dirac spectrum was estimated from the eigenvalues 
of the tridiagonal matrix obtained via BiCGStab. 

This approximation converges better than the hermitian one of Sec. 2.1.1 [22]. 
Its main drawbacks are: (i) it is rather unusual (but not difficult) to monitor 
the whole complex boundary of the Dirac spectrum instead of just its smallest 
eigenvalue; (ii) as in Sec. 2.1.1, the same weight is given to errors anywhere in 
the spectrum, whereas the density of eigenvalues and the impact of an error 
vary greatly over the spectrum. 



2.1.3 Least- squares polynomial 



Instead of minimizing the maximum relative error, i.e., its infinite norm, an- 
other norm can be chosen. The Euclidean norm ||i?||2 has the advantage of 
leading to a quadratic minimization problem. Once the approximation inter- 
val (e.g., [e, 1]) has been selected, the coefficients of polynomial P n can be 
obtained straightforwardly by solving a least-squares problem [23]. There is 
little additional complication if the domain of approximation is a rectangle in 
the complex plane: [xi, x?\ x [y 1 ,y 2 \. Besides approximations of 1/x, other poly- 
nomial approximations like 1 / ^fx can easily be constructed. In principle, the 
weight function for the error can be varied as desired over the approximation 
domain. Therefore, this scheme offers good flexibility. However, it highlights 
the arbitrariness of what one really tries to optimize. 



2.1.4 Adapted polynomial 

This issue of arbitrariness is addressed in [20]. What one wants is to make the 
average correction factor (det 2 {Ip +m) P n {JP +m)) as close to 1 as possible. 
This correction factor, or rather its inverse, can be expressed as a Gaussian 
integral 

def 2 W = IM^^I = (e-I^W) , 



where W = (Jp +m) P n {JP +m) and r\ is a Gaussian random vector. Since the 
correction factor must be close to 1, the Taylor expansion of the exponential 
above may be truncated to its first term, yielding 

det 2 W- 1 « (\Wr]\ 2 - (12) 



to be minimized. In [20], this requirement is replaced by the sufficient condi- 
tion || Wrj — i] \\ 2 m V 77, and P n is found as the polynomial which minimizes 
(|| Wrj — rj H 2 )^, by quadratic minimization, for a fixed set of Gaussian vectors 
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r). The resulting polynomial depends little on the sample of Gaussian vectors, 
so that it appears to be adequate to consider only one vector rj. Moreover, [20] 
considers only one representative background gauge configuration on which 
to perform the minimization. In principle, some subtle averaging over gauge 
configurations should be performed; in practice, different equilibrium gauge 
fields yield almost identical polynomials. This lack of averaging seems to be 
the only drawback of this approach, which otherwise takes into account the 
complete spectral properties of Jp , not just its extreme eigenvalues or its spec- 
tral boundary. An illustrative example is shown in Fig. lb. With n — 16 the 
error is as small as in the approach of Sec. 2.1.2 with n = 20. 

All four possibilities reviewed here for choosing P n converge exponentially: 
the relative error decreases as e~ cn , where c depends on the choice Sec. 2.1.1— 
4, but goes to zero as m, the quark mass. This implies that, for a constant 
accuracy, the number of fields n should grow as 1/m. 

2.2 How to make the algorithm exact? 

Sampling with the measure oc |det _1 P n {Jp +m)\ 2 is an approximation, which 
deviates from the exact measure by a factor C = |det (Jp + m) P n (Jp +m)\ 2 . 
Several proposals have been made to make the sampling exact. 

2. 2. 1 Correction factor 

Liischer [2] originally suggested that the factor C could be incorporated as a 
correction in the observable: 

v*J) exact (C) ' 

(13) 

where the right-hand side involves averages over the approximate distribution. 
The advantage is that C only needs to be computed when a measurement is 
taken, not at every MC step. Sampling can be considered sub-optimal, since 
it is performed with respect to the approximate measure; however, this can 
sometimes be turned into an advantage, as for instance when over-sampling of 
Dirac near- zero modes is desired. In any case, the main problem is to compute 
C. 

2.2.2 Lanczos and Metropolis 

This problem was tackled in [9,10]. Since \/C = TTj \ P n (\), this correction 
factor can be calculated through a Lanczos process which produces all eigen- 
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values A, of [If) +m). The error is typically dominated by the outer (especially 
the smaller) eigenvalues, which are determined first, and so the Lanczos pro- 
cess can be stopped early. The resulting correction C was used in [9,10] to 
perform an accept/reject step and restore importance sampling with respect 
to the exact measure as follows: 

(1) Starting from {U, <p}, perform a reversible sequence of MC steps to obtain 
{U',<fi'}; this sequence is called a trajectory, by analogy with HMC. 

(2) Accept {[/', <$'} with probability mm(l, C'/C); otherwise, include {U, 0} 
in the Markov sequence again. 

Reversibility in step (1) means that the probability of choosing the reverse 
sequence of local MC updates is equal to that of choosing the original sequence. 
This condition is sufficient to ensure detailed balance of the move {U, 0} — > 
{U ',(/)'} with respect to the approximate MB action Eq.(5) [14]. 

As the lattice volume V is increased, the number of eigenvalues contributing 
to C increases like V. To keep the acceptance constant, the relative accu- 
racy of the polynomial approximation must increase accordingly. Since the 
approximation converges exponentially in n, this implies that n oc log V . 

2.2.3 Stochastic Metropolis 

In the acceptance test above, C'/C is only compared with a random num- 
ber. Therefore, it should not be necessary to compute this factor exactly: 
an estimate should suffice. As in Eq.(ll), one may readily see that ^ = 
( e -\w>- l w v \ 2 +\ v \ 2 }^ where w = {ip +m) P n [ip +m) and r) is a Gaussian ran- 
dom vector. Therefore, step (2) above can be replaced by a noisy Metropolis 
test [24] as: 

(2) Draw a Gaussian random vector i], and accept {[/', 0'} with probability 

min{l,e-\ w '- lw ^+W 2 ). 

This is a very economical way to enforce exactness of the algorithm. Since W is 
close to unity, W'~ 1 (Wri) can be computed in a few iterations of a linear solver. 
It is even possible to cut down on the number of iterations by first drawing the 
random number in the acceptance test [19]. In case W is difficult to express, as 
for instance for 1 flavor, where W = (If +m) 1 / 2 P n {Jp +m), it can be replaced 
by a high-quality polynomial approximation W ~ P^ x (Jf) + m) P n (If + m), 
where h ^> n [4]. A similar strategy is used in [5]. 

The average acceptance depends on the fluctuations of C'/C about 1. It can 
be predicted rather accurately as a function of n, m, and V, with a single- 
parameter ansatz accounting for the fluctuations of the gauge field (i.e., for 
the value of (3) [14]. As n varies, the acceptance behaves as e~ e so that it 
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changes rather abruptly from nearly zero to nearly 1. 



2.3 Reducing the degree of the polynomial by preconditioning 



The difficulty of approximating \jx increases with the width of the approx- 
imation interval. As can be seen already from Eq.(9), the degree n of the 
approximation must increase linearly with the "dynamic range" A max /A m j„ 
(= \je for Sec. 2.1.1) considered. Reduction of this dynamic range, or equiv- 
alently, of the condition number of the matrix to invert, may be achieved by 
preconditioning. Two simple and efficient ideas have been put forward. 



2. 3. 1 Even-odd preconditioning 

This preconditioning exploits the identity {M, £} = 0, where M is the Wilson 
(or staggered) hopping matrix, and S is the diagonal matrix with elements 
^x, y = (~ l) x &c,{/- It follows that 

det(l - kM) = det(l + kM) = det(l - n 2 M eo M oe ) , (14) 



where M eo hops from odd sites to even ones. The preconditioned matrix 
Jf) = 1 — K?M eo M oe is easier to invert than the original one, as is well known 
in the context of quark propagator calculations. Now the approximating poly- 
nomial Pn{Jp) factorizes into Ylki^P ~ -41); an d each monomial gives rise to a 
partial determinant det _1 (^ — £kl)\ip — -41). Turning this determinant into 
a Gaussian integral, one would expect a multiboson action \{Ip — f fc l)0 fc | 2 , i.e., 
with range 4. Fortunately, a simpler Gaussian integral can be used, because 



. 1 — x — nM eu , 
det(l — k M eo M oe — ifel) = det j , (15) 



-nM oe 1 - y 



provided (1 — x)(l — y) = 1 — £k- Thus, for the effective action one may use 
|(1 — Zk — nM)(pk\ 2 , where Zk takes value x on even sites and y on odd ones, 
satisfying the above equality. The simplest choice is x = y = 1 — (1 — Zk) 1 ^ 2 
[14]; alternatively, y — 0, x — z k has also been used [12]. Either way, the degree 
n of the approximation is reduced by a factor of 2 or more, with no overhead. 
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2.3.2 UV filtering 



This preconditioning makes use of the identity e det e = 1, so that 

det(l - kM) = e -^i ajTrMj x det ((1 - kM) e + ^ ajMj ) . (16) 



The number of non-zero coefficients a,j and their values can be adjusted and 
optimized freely. The idea is that the large Dirac eigenvalues, corresponding to 
UV Fourier modes, can be accounted for by the first factor e ~^j a ^ vM \ The 
polynomial P n (M) must then approximate (1 — kM)^ 1 e - ^'" 5 ^, where the 
UV modes have been exponentially damped, or "filtered out." The dynamic 
range of the approximation is thus reduced, and with it the degree of the 
polynomial approximation. There is no overhead for this preconditioning up 
to order M 4 : the first three terms give zero trace, and the fourth simply 
shifts the coupling f3 of the plaquette in the gauge action. The values of the 
coefficients dj can be optimized at the same time as the polynomial P n is 
determined (see [20] for details). This UV filtering amounts to removing from 
the determinant the first term(s) of its loop expansion. It has been observed 
that the major effect of dynamical quarks is to shift the gauge coupling f3, and 
not much else, down to rather light quarks [26]. Therefore, the preconditioning 
should be very effective. Indeed, the number of boson fields remains much 
smaller than the number of linear solver iterations necessary for HMC with the 
same parameters, and the memory bottleneck of the MB approach essentially 
disappears. Figure lc shows the zeroes of a degree-7 polynomial which provides 
the same accuracy as those of Figs, la and lb. Note how these zeroes are 
concentrated near the IR part of the spectrum. 
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Fig. 1. Three strategies to choose the zeroes of the approximating polynomial, all 
giving similar acceptances: from left to right, (a) non-hermitian Chebyshev poly- 
nomial (n = 20); (b) adapted polynomial (n = 16); (c) adapted polynomial with 
UV- filtering (n = 7). The small symbols, forming a contour, are an estimate of 
the boundary of the complex Dirac spectrum, obtained as a by-product from the 
iterative solver. 
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2.4 Implementation details 



As seen in Sec. 2.2, making the MB method exact involves applying the polyno- 
mial P n {Jp) to some vector v. Doing this accurately enough can be problematic 
on 32-bit machines like the Quadrics/APE, and much effort has been devoted 
to the study of round-off errors [23,25]. The lessons learned from these studies 
are: (i) whenever possible, avoid representing P n as a product of monomials. 
P n can be decomposed on a basis of orthogonal (esp. Chebyshev) polynomials, 
where at each stage the lower-degree polynomial is the best approximation to 
P n . The so-called Clenshaw recursion [27,25] gives particularly good control 
over round-off errors; (ii) on 32-bit machines, applying P n naively as a prod- 
uct of monomials will give unacceptable errors even for moderate n (20-30), 
unless careful ordering of the factors is performed. Montvay's ordering [23] 
and the bit-reversal scheme [25] appear to be best. 



3 Dynamics 

3. 1 Theory 

The theoretical understanding of MB dynamics is very crude. The situation is 
more complicated than with HMC, because one normally alternates single MC 
steps for the gauge fields and for the boson fields, so that the two dynamics 
are intimately coupled. Nevertheless, some simple facts derive from the form 
Eq.(5) of the MB effective action. 

(1) Each boson field contributes a harmonic piece to the potential seen by the 
gauge field, and this potential gets steeper with the number n of boson 
fields. Consequently, the step size of the gauge-field update decreases as 
1/n, and the autocorrelation time increases as n [9,8]. 

(2) When the gauge field is kept fixed, each boson species <pk has a mass which 
is the smallest singular value of (1-^-kM). This mass goes to zero as Zk 
approaches the boundary of the Dirac spectrum. This normally happens 
principally to the boson fields governing the IR part of the spectrum, 
which therefore have the slowest dynamics. 

It is thus clear that critical slowing down occurs in both the gauge and the 
boson sectors as the quark mass is decreased. Under plausible assumptions, 
Ref. [14] suggests an autocorrelation time ~ n/m z , where z — 1 — 2 is the 
dynamical critical exponent for the boson-field update. 
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3.2 Practical knowledge 



Because the MB action is local, a choice of standard local MC updates is 
available: Metropolis, heatbath, and overrelaxation, for both gauge and boson 
fields [pseudo-heatbath for 577(3) gauge fields]. A trajectory between succes- 
sive global accept/reject steps must consist of a reversible sequence of local MC 
steps (Sec. 2.2.2) Beyond this, there is complete freedom in choosing among 
the various updates. 

It has been observed that a judicious ratio of gauge to boson updates and of 
heatbath to overrelaxation can accelerate the dynamics by a factor of ~ 3 [12]. 
However, this tuning must presumably be performed anew for each value of 
the quark mass (and preferably for each boson species): no general rule has 
emerged. 

Also, since the frozen bosons prevent a large MC step for the gauge fields, a 
combined update (one link + all boson fields at its two ends) has been devised 
[8,12]. However, the gain in the dynamics is moderate, and is all but absorbed 
in the extra work per update. 

3.3 Implementation 

Since the MB action is local, it is trivial to divide the lattice into sets of 
decoupled variables, and to distribute the update of a set on a parallel ma- 
chine. MB programs have been written for the Quadrics/APE, the T3D/E, 
or based on MPI for portability. The minimum decomposition for Wilson 
fermions (r-yviison — 1) is into 8 sets (2 opposite points per 2 4 hypercube [14]). 
A more efficient organization is the "star" updating strategy [28]: all boson 
fields at a given site and all 8 gauge links attached to it are updated before go- 
ing to the next site. This allows reuse of intermediate results (e.g., the corners 
of the gauge plaquettes), thus reducing the total amount of work. And one 
can easily integrate analytically over the boson fields at the central site, which 
yields an effective gauge action allowing for a larger MC step. This provides 
advantages similar to those of the combined update of [8] above, without the 
overhead. 

3.4 Prospects 

Overrelaxation [29] is a remarkably efficient local update. There is little hope of 
improving on it. On the other hand, the action of each boson field is Gaussian. 
It is so simple that one may hope to accelerate the boson dynamics at low 
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computing cost. One avenue still not explored is to devise a cluster MC. A 
simpler one is to consider a global boson heatbath, achieved by assigning <pk = 
(1 — zj, — k,M)~ 1 i], with i] a Gaussian random vector. Theoretical arguments 
have been proposed, suggesting that such a global update is superior to a local 
one as m — > [30]. The recent proposal of a quasi-heatbath [31] (approximate 
global heatbath + accept /reject) reduces the cost of such a strategy and makes 
it worth exploring. 



4 Assessment of MB vs HMC 

4-1 Theoretical cost analysis 

A naive analysis of the cost of a MultiBoson simulation as a function of the 
volume V and the quark mass m goes as follows [14,30,32]. 

• The work per bosonic update step is proportional to V and to the number n 
of boson fields. The same holds for a gauge update step, because the gauge 
force ("staple") sums up n boson contributions. 

• The autocorrelation time grows as n/m z , with z = 1 — 2 (see Sec. 3.1). 

• The number of fields n grows as m~ l log V (see Sees. 2.1 and 2.2.2). 

The work per independent configuration thus grows as V(log V) 2 m~ 2 ~ z , to be 
compared with the HMC case: \/ 5 / 4 m" p , p = 13/4—7/2 [1,32]. It is likely that 
the boson dynamics have a dynamical exponent z = 2. In that case, the MB 
approach would scale somewhat better than HMC with respect to the volume, 
and rather worse with respect to the quark mass. This analysis is no more than 
plausible. But it agrees with practical observations: with MB simulations, one 
tends to pay a low price when increasing the lattice size, and a rather stiff one 
when decreasing the quark mass. 

4-2 Pros and cons of MB 

Some of the more or less well known virtues and shortcomings of the MB 
approach are the following. 

Pros: 

• As the fermion mass m increases, the number n of boson fields can be re- 
duced, and the MB dynamics approach quenched local dynamics (pseudo- 
heatbath and overrelaxation). This is in contrast to HMC, which approach 
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quenched molecular dynamics, measured to be almost two orders of mag- 
nitude slower [33]. Therefore, MB has to be much faster than HMC if the 
fermions are heavy enough. 

• The MB action is quadratic in the fermion mass m or in k. This allows for 
straightforward reweighting of the Monte Carlo results for [slightly] different 
masses. 

• The enlarged MultiBoson phase space facilitates the dynamics around ef- 
fective energy barriers. 

This last notion must be clarified. Consider the gauge force |^ in the MB 
action S = J2k 1(^(^0 + m ~~ z k)<Pk\ 2 - Each of the n terms in the derivative 
contains a different factor of 4>\4>k- These factors add up incoherently, in con- 
trast to HMC where there is only one factor of 4> <i>- This incoherent sum has 
the effect of smoothing the divergence of the gauge force in the vicinity of 
a Dirac zero mode. Equivalently, the added bosonic dimensions in the phase 
space provide the MB algorithm with many paths which "go around" the 
energy barrier, making its crossing easier. This may give a qualitative advan- 
tage to the MB method for moving through topological sectors at small quark 
masses. Similarly it is the decoupling of the various Fourier modes which makes 
UV filtering possible (Sec. 2.3.2). If one uses the same UV-filtered polynomial 
in the HMC action, the step size must be adjusted to track the fast dynamics 
of the UV modes, which makes the IR dynamics hopelessly slow. 

Cons: 

• MB needs more memory to store the bosonic fields. However, this no longer 
represents a real obstacle. With UV-filtering, simulations of large volumes 
at small masses can be performed with no more than (9(50) boson fields. 

• The MB approach is designed for the standard (Wilson or staggered) Dirac 
discretization. Any less local discretization makes it very difficult to use 
efficient local update algorithms for the gauge and boson fields. 

4-3 Actual performance comparison 

Performance comparisons with HMC are summarized in chronological order 
in the rightmost column of Table I. There is a slight trend for the advantage 
of MB to increase with time owing to successive algorithmic improvements. 
But several caveats are in order. 

(1) Extracting autocorrelation times with some accuracy is notoriously ex- 
pensive. Moreover, with MB simulations, there is a systematic effect by 
which unusual configurations, in the tail of the distribution, tend to be as- 
sociated with much longer autocorrelation times because the polynomial 
P n is a bad approximation on such background fields and the acceptance 
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(Sec. 2.2.2) is very small. Therefore, short MC runs tend to paint a rosy 
picture by underestimating the MB autocorrelation time. 

(2) The version of HMC serving for comparison does not always incorporate 
the complete set of "bells and whistles" (even-odd or SSOR precondi- 
tioning, BiCG solver, incomplete convergence or extrapolation of solution 
during MD trajectory [1]). Each of these technical refinements improves 
HMC performance by a factor 0(2). 

(3) As seen above, MB is a superior method when the fermions are heavy. 
So a meaningful question would be: how small a fermion mass does it 
take for HMC to win over MB, if at all? But, of course, light fermion 
simulations are even more expensive. 

(4) What is compared is usually the integrated autocorrelation time of the 
plaquette, expressed in applications of the Dirac matrix to a vector. Some- 
times, more relevant, larger-scale observables like the pion correlator have 
also been measured, and give rather similar results. However, a compar- 
ison of the decorrelation of the topological charge, yet to be performed, 
might present a different picture. 

To summarize, my subjective assessment of Table I is that MB is finally be- 
coming competitive with HMC in the interesting regime of quark masses. 



5 Conclusion 

The initial enthusiasm for the MB method has now abated. The general im- 
pression is that for light quarks, MB is roughly equivalent to HMC in terms of 
efficiency, and that research effort is better spent on improving the discretiza- 
tion of the Dirac operator. Indeed, one might argue that everything that can 
be done with MB can be done, perhaps better and more simply, within the 
HMC framework. 

MB replaces the adaptive inversion of the Dirac operator performed by the 
linear solver at each HMC step by a fixed approximation. This substitution 
can be performed directly in the HMC effective action [34]. While it leads to 
no special advantage for the simulation of 2 flavors, this replacement allows 
the exact simulation of any number of flavors , (with the same limitations as 
MB on the positivity of the Dirac eigenvalues [4]), and opens the possibility 
of biased sampling, e.g., of Dirac near-zero modes [35]. 

However, this dismissive statement is not completely true. The presence of 
multiple boson fields changes the dynamics significantly. It essentially replaces 
the deterministic, IR-driven dynamics of HMC by fast diffusive dynamics. This 
is a clear advantage for heavy fermions, although possibly a handicap for light 
ones. In addition, the multiple boson fields allow the breaking and separate 
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treatment of the various scales (IR and UV) of the Dirac operator, as in the 
UV-filtered version. A similar treatment is hopelessly inefficient within the 
HMC framework. 

Nevertheless, the strongest limitation of the MB approach appears to be its 
most highly praised characteristic: the action is local. Actually, the range of 
the boson interaction is twice that of the Dirac operator. This makes it cum- 
bersome to implement local update steps for any but the simplest Dirac dis- 
cretization (Wilson or staggered). As increasing attention is paid to reducing 
discretization errors, the Dirac operator becomes less local and the MB ap- 
proach loses much of its appeal. 

Therefore, the MB method at present is a "niche" algorithm. It is a superior 
choice for the simulation of heavy dynamical fermions. And the same algorithm 
"template" can be used to simulate exotic numbers of flavors or other unusual 
fermionic determinants. Further research efforts to accelerate MB dynamics 
may well broaden its appeal. 
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